15 research outputs found

    Direction of Arrival with One Microphone, a few LEGOs, and Non-Negative Matrix Factorization

    Get PDF
    Conventional approaches to sound source localization require at least two microphones. It is known, however, that people with unilateral hearing loss can also localize sounds. Monaural localization is possible thanks to scattering by the head, though it hinges on learning the spectra of the various sources. We take inspiration from this human ability to propose algorithms for accurate sound source localization using a single microphone embedded in an arbitrary scattering structure. The structure modifies the frequency response of the microphone in a direction-dependent way, giving each direction a signature. While knowing those signatures is sufficient to localize sources of white noise, localizing speech is much more challenging: it is an ill-posed inverse problem which we regularize by prior knowledge in the form of learned non-negative dictionaries. We demonstrate a monaural speech localization algorithm based on non-negative matrix factorization that does not depend on sophisticated, designed scatterers. In fact, we show experimental results with ad hoc scatterers made of LEGO bricks. Even with these rudimentary structures we can accurately localize arbitrary speakers; that is, we do not need to learn the dictionary for the particular speaker to be localized. Finally, we discuss multi-source localization and the related limitations of our approach. Comment: This article has been accepted for publication in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP).
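    The dictionary-plus-signature idea above can be illustrated with a small sketch. Everything here (the localize helper, the dictionary W, the per-direction signatures H) is a hypothetical illustration under simplified assumptions, not the authors' implementation: each candidate direction filters a learned non-negative speech dictionary by that direction's magnitude response, and the direction whose filtered dictionary best explains the observed single-microphone spectrogram under non-negative activations wins.

```python
# Minimal sketch of dictionary-based monaural DoA (illustrative, not the
# paper's exact NMF algorithm). The scatterer gives each direction d a
# magnitude response ("signature") H[d]; speech magnitude spectra are
# modeled as non-negative combinations of a learned dictionary W.
import numpy as np
from scipy.optimize import nnls

def localize(S, W, H):
    """S: (F, T) observed magnitude spectrogram from the single microphone.
    W: (F, K) learned non-negative speech dictionary.
    H: (D, F) magnitude signatures of the D candidate directions.
    Returns the index of the best-fitting direction."""
    errors = np.zeros(len(H))
    for d, h in enumerate(H):
        Wd = h[:, None] * W                  # direction-filtered dictionary
        for t in range(S.shape[1]):          # non-negative activations per frame
            _, residual = nnls(Wd, S[:, t])
            errors[d] += residual ** 2
    return int(np.argmin(errors))
```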

    Acoustic DoA Estimation by One Unsophisticated Sensor

    Get PDF
    We show how introducing known scattering enables direction of arrival estimation with a single sensor. We first present an analysis of the geometry of the underlying measurement space and show how it enables localizing white sources. Then, we extend the solution to more challenging non-white sources such as speech by including a source model and considering convex relaxations with group sparsity penalties. We conclude with numerical simulations using an unsophisticated sensing device to validate the theory.
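    As a rough illustration of the group-sparsity relaxation mentioned above (under a simplified measurement model, not the paper's exact formulation), one can pose the single-sensor problem as recovering per-direction source spectra from their superposition, with a mixed l2/l1 penalty that switches whole directions on or off. The names and the CVXPY formulation below are assumptions made for the sketch.

```python
# Illustrative group-sparse convex relaxation for single-sensor DoA:
# the sensor observes Y[f, t] = sum_d H[d, f] * X[d, f, t]; the penalty
# sums one l2 norm per direction so only a few directions stay active.
import numpy as np
import cvxpy as cp

def estimate_active_directions(Y, H, lam=0.1):
    D, F = H.shape
    T = Y.shape[1]
    X = [cp.Variable((F, T), nonneg=True) for _ in range(D)]   # per-direction spectra
    model = sum(np.diag(H[d]) @ X[d] for d in range(D))        # superposition at the sensor
    group_penalty = sum(cp.norm(X[d], "fro") for d in range(D))
    prob = cp.Problem(cp.Minimize(cp.sum_squares(Y - model) + lam * group_penalty))
    prob.solve()
    return np.array([np.linalg.norm(X[d].value) for d in range(D)])  # per-direction energy
```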

    Audio Novelty-Based Segmentation of Music Concerts

    Get PDF
    The Swiss Federal Institute of Technology in Lausanne (EPFL) is in the process of digitizing an exceptional collection of audio and video recordings of the Montreux Jazz Festival (MJF) concerts. Since 1967, five thousand hours of both audio and video have been recorded, with about 60% digitized so far. In order to make these archives easily manageable, ensure the correctness of the supplied metadata, and facilitate copyright management, one of the desired tasks is to know exactly how many songs are present in a given concert and to identify them individually, even in very problematic cases (such as medleys or long improvisational periods). However, due to the sheer amount of recordings to process, it is a cumbersome and time-consuming task to have a person listen to each concert and identify every song. Consequently, it is essential to automate the process. To that end, this paper describes a strategy for automatically detecting the most important changes in an audio file of a concert; for MJF concerts, those changes correspond to song transitions, interludes, or applause. The presented method belongs to the family of audio novelty-based segmentation methods. The general idea is to first divide a whole concert into short frames, each a few milliseconds long, from which well-chosen audio features are extracted. Then, a similarity matrix is computed which provides information about the similarity between each pair of frames. Next, a kernel is correlated along the diagonal of the similarity matrix to determine the audio novelty scores. Finally, peak detection is used to find significant peaks in the scores, which are suggestive of a change. The main advantage of such a method is that no training step is required, as opposed to most classical segmentation algorithms. Additionally, relatively few audio features are needed, which reduces the amount of computation and the run time. It is expected that such preprocessing will speed up the song identification process: instead of having to listen to hours of music, the algorithm will produce markings indicating where to start listening. The presented method is evaluated using real concert recordings that have been segmented by hand, and its performance is compared to the state of the art.
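    The pipeline described above (short frames, features, self-similarity matrix, checkerboard kernel along the diagonal, peak picking) closely follows Foote-style novelty segmentation, so a compact sketch is possible. The feature choice, kernel size, and peak threshold below are illustrative assumptions, not the paper's tuned settings.

```python
# Rough sketch of audio novelty-based segmentation (Foote-style).
import numpy as np
import librosa
from scipy.signal import find_peaks

def novelty_segmentation(path, kernel_size=64):
    y, sr = librosa.load(path, sr=None)
    # Short frames -> audio features (here MFCCs), one unit-norm vector per frame.
    feats = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    feats = librosa.util.normalize(feats, norm=2, axis=0)
    # Similarity matrix: cosine similarity between every pair of frames.
    S = feats.T @ feats
    # Checkerboard kernel correlated along the main diagonal -> novelty score.
    half = kernel_size // 2
    kernel = np.kron(np.array([[1, -1], [-1, 1]]), np.ones((half, half)))
    novelty = np.zeros(S.shape[0])
    for i in range(half, S.shape[0] - half):
        novelty[i] = np.sum(S[i - half:i + half, i - half:i + half] * kernel)
    # Significant peaks suggest song transitions, interludes, or applause.
    peaks, _ = find_peaks(novelty, prominence=np.std(novelty))
    return librosa.frames_to_time(peaks, sr=sr)
```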

    In Vivo Visualization of Hair Follicles by Ultrasound Biomicroscopy in Alopecia Areata and its Correlation with Histopathology

    Get PDF
    Ultrasound biomicroscopy (UBM) is a non-invasive imaging technique used in the examination of several skin diseases but not previously in imaging hair and scalp diseases. The main objective of this investigation was to assess the efficacy of UBM for in vivo visualization of hair follicles in cases of alopecia areata (AA) and to correlate the findings with histopathology. This study included 30 patients with AA. Two areas, one with AA and a control area, were marked, examined by UBM, and then biopsied for histopathological examination. In patients with alopecia totalis (AT) or alopecia universalis (AU), only an AA area was examined. Non-echogenic conical shadows reaching the epidermal entrance echo (probably corresponding to the hair follicles) were seen and were wider and fewer in number in areas of AA than in normal control areas. No significant difference was found regarding the number and width of hair follicles between UBM and histopathological examination. However, a significant increase in the length of follicles was detected on histopathology, indicating that the UBM image was probably unable to reach the deepest part of the follicle. The main limitation of the study is the small number of cases. No significant difference was found between UBM and histological measurements of hair follicle number and width in patients with AA, making UBM a useful tool for in vivo visualization of hair follicles.

    Diagnostic and prognostic impact of E6/E7 mRNA compared to HPV DNA and p16 expression in head and neck cancers: an Egyptian study

    Get PDF
    Introduction: Human papillomavirus (HPV) is identified as a culprit in a subset of head and neck squamous cell carcinomas (HNSCCs). The clinicopathologic profile displayed by this subset diverges from that of HPV-negative HNSCCs. Despite a variety of available tests, there is no consensus on which technique is best for detection of HPV in HNSCCs. Although this field has received substantial interest on different continents, African and Egyptian populations are not yet well studied in the literature. Methods: This cross-sectional study was carried out to detect HPV prevalence in HNSCC and to correlate viral prevalence with different clinicopathologic parameters as well as with the patients’ outcome. For 51 patients with HNSCC, HPV-16 DNA was determined via PCR, while E6/E7 mRNA was detected by real-time PCR. Immunohistochemistry (IHC) was performed to assess p16 status. Results: p16 was overexpressed in 49% of cases, while HPV-16 DNA was detected in 52.9% of cases, and likewise, E6/E7 mRNA was found in 52.9% of cases. There was very good agreement between HPV-16 DNA and mRNA results (κ = 0.843, P-value <0.001). Meanwhile, good agreement was found between HPV-16 DNA and p16 IHC results (κ = 0.608, P-value <0.001). Similarly, there was good agreement between HPV-16 mRNA and p16 IHC results (κ = 0.608, P-value <0.001). By the end of the study period, 13.7% of the enrolled patients had died, with the overall survival of the studied patients being 17.29 months. Of note, there was no statistically significant correlation between overall survival and HPV status. Conclusion: The present study highlights the significant role played by HPV in HNSCC. Furthermore, it shows that although p16 has been used as a marker of HPV in HNSCC, it should not be the sole determinant of the role of HPV in tumorigenesis.
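    For readers unfamiliar with the κ statistics quoted above, the sketch below shows how agreement between two binary assays (e.g. HPV-16 DNA PCR vs E6/E7 mRNA) can be quantified with Cohen's kappa; the per-case calls are made-up placeholders, not study data.

```python
# Illustrative Cohen's kappa between two hypothetical binary assay readouts.
from sklearn.metrics import cohen_kappa_score

dna_positive = [1, 1, 0, 1, 0, 0, 1, 1]    # hypothetical per-case HPV-16 DNA calls
mrna_positive = [1, 1, 0, 1, 0, 1, 1, 1]   # hypothetical per-case E6/E7 mRNA calls

kappa = cohen_kappa_score(dna_positive, mrna_positive)
print(f"Cohen's kappa = {kappa:.3f}")      # > 0.8 is conventionally "very good" agreement
```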

    AudioPaLM: A Large Language Model That Can Speak and Listen

    Full text link
    We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech, with applications including speech recognition and speech-to-speech translation. AudioPaLM inherits from AudioLM the capability to preserve paralinguistic information such as speaker identity and intonation, and from text large language models such as PaLM-2 the linguistic knowledge present only in text. We demonstrate that initializing AudioPaLM with the weights of a text-only large language model improves speech processing, successfully leveraging the larger quantity of text training data used in pretraining to assist with the speech tasks. The resulting model significantly outperforms existing systems on speech translation tasks and has the ability to perform zero-shot speech-to-text translation for many languages for which input/target language combinations were not seen in training. AudioPaLM also demonstrates features of audio language models, such as transferring a voice across languages based on a short spoken prompt. We release examples of our method at https://google-research.github.io/seanet/audiopalm/examples. Comment: Technical report.

    Good Vibrations and Unknown Excitations

    No full text
    Inspired by the human ability to localize sounds, even with only one ear, as well as to recognize objects using active echolocation, we investigate the role of sound scattering and prior knowledge in regularizing ill-posed inverse problems in acoustics. In particular, we study direction of arrival estimation with one microphone, acoustic imaging with a small number of microphones, and microphone array localization. Not only are these problems ill-posed but also non-convex in the variables of interest when formulated as optimization problems. To restore well-posedness, we thus use sound scattering which we construe as a physical form of regularization. We additionally use standard regularization in the form of appropriate priors on the variables. The non-convexity is then handled with tools such as linearization or semidefinite relaxation.

    We begin with direction of arrival estimation. While conventional approaches require at least two microphones, we show how to estimate the direction of one or more sound sources using only one. This is made possible thanks to regularization by sound scattering which we achieve by compact structures made from LEGO that scatter the sound in a direction-dependent manner. We also impose a prior on the source spectra where we assume they can be sparsely represented in a learned dictionary. Using algorithms based on non-negative matrix factorization, we show how to use the LEGO devices and a speaker-independent dictionary to successfully localize one or two simultaneous speakers.

    Next, we study acoustic imaging of 2D shapes using a small number of microphones. Unlike in echolocation where the source is known, we show how to image an unknown object using an unknown source. In this case, we enforce a prior on the object using a total variation norm penalty but no priors on the source. We also show how to use microphones embedded in the ears of a dummy head to benefit from the diversity encoded in the head-related transfer function. We then propose an algorithm to jointly reconstruct the shape of the object and the sound source spectrum. We demonstrate the effectiveness of our approach using numerical and real experiments with speech and noise sources.

    Finally, the need to know the microphone positions in acoustic imaging and a number of other applications led us to study microphone localization. We assume the positions of the loudspeakers are also unknown and that all devices are not synchronized. In this case, the times of arrival from the loudspeakers to the microphones are shifted by unknown source emission times and unknown sensor capture times. We thus propose an objective that is timing-invariant, allowing us to localize the setup without first having to estimate the unknown timing information. We also propose an approach to handle missing data as well as show how to include side information such as knowledge of some of the distances between the devices. We derive a semidefinite relaxation of the objective which provides a good initialization to a subsequent refinement using the Levenberg-Marquardt algorithm. Using numerical and real experiments, we show we can localize unsynchronized devices even in near-minimal configurations.
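    As one concrete illustration of the priors mentioned in the imaging part of the abstract, the sketch below recovers a 2D object from generic linear acoustic measurements with a total variation penalty. The forward operator A, the measurements y, and the sizes are placeholders; the thesis's joint estimation of object shape and unknown source spectrum is not reproduced here.

```python
# Generic TV-regularized inverse problem: min ||A vec(X) - y||^2 + lam * TV(X).
import numpy as np
import cvxpy as cp

def tv_reconstruct(A, y, shape, lam=0.05):
    X = cp.Variable(shape, nonneg=True)            # 2D object, e.g. an occupancy map
    data_fit = cp.sum_squares(A @ cp.vec(X) - y)   # linear measurement model (placeholder A)
    cp.Problem(cp.Minimize(data_fit + lam * cp.tv(X))).solve()
    return X.value
```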

    Circulating level of interleukin-6 in relation to body mass indices and lipid profile in Egyptian adults with overweight and obesity

    No full text
    Abstract Background Obesity is an important feature of metabolic syndrome, and the link between them has been attributed to the state of chronic inflammatory process. The purpose of the study is to investigate the relation of circulating level of IL-6 as an inflammatory cytokine to body mass index and lipid profile in adults with overweight and obesity. Methods This cross-sectional study included 15 adults with overweight, 45 with obesity (15 grade I, 15 grade II, and 15 grade III), and 25 average weight controls. Circulating IL-6 level and lipid profile were measured. Results Highly significant differences were found between study groups in different grades of obesity as regards weight, body mass index, serum triglycerides, and serum LDL-C. Circulating levels of IL6 were significantly higher in subjects with overweight and obesity. There were significantly positive correlations between circulating levels of IL6 and BMI in subjects with grade III obesity and negative correlation with serum HDL-C in subjects with grade II obesity. Conclusion High circulating level of IL-6 could reflect the intensity of the chronic and systemic inflammation that develops with high degrees of obesity, which might contribute to the development of atherosclerosis and coronary heart diseases, both directly and by reducing HDL-C levels

    Localizing Unsynchronized Sensors with Unknown Sources

    Full text link
    We propose a method for sensor array self-localization using a set of sources at unknown locations. The sources produce signals whose times of arrival are registered at the sensors. We look at the general case where neither the emission times of the sources nor the reference time frames of the receivers are known. Unlike previous work, our method directly recovers the array geometry, instead of first estimating the timing information. The key component is a new loss function which is insensitive to the unknown timings. We cast the problem as a minimization of a non-convex functional of the Euclidean distance matrix of microphones and sources subject to certain non-convex constraints. After convexification, we obtain a semidefinite relaxation which gives an approximate solution; subsequent refinement on the proposed loss via the Levenberg-Marquardt scheme gives the final locations. Our method achieves state-of-the-art performance in terms of reconstruction accuracy, speed, and the ability to work with a small number of sources and receivers. It can also handle missing measurements and exploit prior geometric and temporal knowledge, for example if either the receiver offsets or the emission times are known, or if the array contains compact subarrays with known geometry.
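    A minimal sketch of the refinement stage only, under a plain joint parameterization: each time of arrival is modeled as distance over the speed of sound plus an unknown per-receiver capture offset and per-source emission time, and everything is refined with Levenberg-Marquardt. The variable names are assumptions; the timing-invariant loss and the semidefinite initialization of the paper are not reproduced, and a reasonable starting point (plus enough measurements) is assumed. As usual, the recovered geometry is determined only up to a rigid motion and a constant shift traded between offsets and emission times.

```python
# Joint TOA refinement (illustrative): T[i, j] ~ ||r_i - s_j|| / c + offset_i + emission_j.
import numpy as np
from scipy.optimize import least_squares

def refine(T, R0, S0, c=343.0):
    M, N = T.shape                      # M receivers, N sources
    dim = R0.shape[1]
    x0 = np.concatenate([R0.ravel(), S0.ravel(), np.zeros(M + N)])

    def residuals(x):
        R = x[:M * dim].reshape(M, dim)
        S = x[M * dim:(M + N) * dim].reshape(N, dim)
        off = x[(M + N) * dim:(M + N) * dim + M]    # receiver capture offsets
        emi = x[(M + N) * dim + M:]                 # source emission times
        D = np.linalg.norm(R[:, None, :] - S[None, :, :], axis=2)
        return (D / c + off[:, None] + emi[None, :] - T).ravel()

    sol = least_squares(residuals, x0, method="lm") # Levenberg-Marquardt refinement
    R = sol.x[:M * dim].reshape(M, dim)
    S = sol.x[M * dim:(M + N) * dim].reshape(N, dim)
    return R, S
```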